-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: add templating for RANK and ROW_NUMBER window functions #37282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should add some lightweight benchmarks to these. Also add the .eg.go files to .gitignore.
Reviewed 5 of 6 files at r1, 5 of 5 files at r2, 5 of 5 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/vecbuiltins/rank_tmpl.go, line 52 at r4 (raw file):
// {{range .}} type rankDense__DENSE_HasPartition__PARTITION_Op struct {
You could make an embedded struct that has all of the data that's repeated in these, to avoid the nasty initialization above.
pkg/sql/exec/vecbuiltins/rank_tmpl.go, line 103 at r4 (raw file):
rankCol := r.batch.ColVec(r.outputColIdx).Int64() sel := r.batch.Selection() if sel != nil {
I would probably try to template sel vs nonsel here as well since there's so much code in the body.
pkg/sql/exec/vecbuiltins/row_number.go, line 59 at r2 (raw file):
// TODO(yuzefovich): I couldn't figure out how to pass the batch as an // argument, so I embedded it into the struct. Is it possible to pass it as // an argument?
You can pass it as an argument to this function, yes. The template "function" can see everything in its surrounding scope - so what I usually do is give the template function an argument of the same name as the equivalent in the outer scope, so it looks like ordinary Go code.
pkg/sql/exec/vecbuiltins/row_number_tmpl.go, line 29 at r2 (raw file):
// {{/* func _NEXT_ROW_NUMBER(hasPartition bool) { // */}}
So I'd give this guy a batch coldata.Batch
argument, and also give nextBodyWithPartition
that same named argument, and things should just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left TODOs for the benchmarks and for sel vs non-sel cases in rankOp. I want to get it in as is (undertested with no benchmarks) since it's already a pretty big PR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @solongordon)
pkg/sql/exec/vecbuiltins/rank_tmpl.go, line 52 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
You could make an embedded struct that has all of the data that's repeated in these, to avoid the nasty initialization above.
Looks better, thanks!
pkg/sql/exec/vecbuiltins/rank_tmpl.go, line 103 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I would probably try to template sel vs nonsel here as well since there's so much code in the body.
Also left a TODO for this.
pkg/sql/exec/vecbuiltins/row_number.go, line 59 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
You can pass it as an argument to this function, yes. The template "function" can see everything in its surrounding scope - so what I usually do is give the template function an argument of the same name as the equivalent in the outer scope, so it looks like ordinary Go code.
Indeed, that worked, thanks!
pkg/sql/exec/vecbuiltins/row_number_tmpl.go, line 29 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
So I'd give this guy a
batch coldata.Batch
argument, and also givenextBodyWithPartition
that same named argument, and things should just work.
Done, thanks!
RFAL guys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/exec/execgen/cmd/execgen/rank_gen.go, line 31 at r5 (raw file):
} func (r rankTmplInfo) UpdateRank() string {
Comments on these would be nice. In general I worry that this code will be confusing as it is far removed from where it is used, but that's kind of a limitation of the model in general. I think comments are the best we can do for now.
e094502
to
b95d2f3
Compare
7979fab
to
e339160
Compare
Templates out rankOp into two operators (for dense and non-dense case) and templates out support for PARTITION BY clause for both rankOp and rowNumberOp. The code is undertested and is lacking benchmarks, but that will be addressed soon. Release note: None
TFTR! bors r+ |
36101: roachprod: Minor readme updates r=bobvawter a=bdarnell Use new convenience commands and demonstrate running a workload Release note: None 37282: exec: add templating for RANK and ROW_NUMBER window functions r=yuzefovich a=yuzefovich Templates out rankOp into two operators (for dense and non-dense case) and templates out support for PARTITION BY clause for both rankOp and rowNumberOp. The code is undertested and is lacking benchmarks, but that will be addressed soon. 37412: server/heapprofiler: don't consider "goIdle" memory r=andreimatei a=andreimatei Before this patch we were taking a heap profile once RSS got above 85% of system memory and then we wouldn't take another profile until we go below it and then again above it. The problem with the RSS is that it includes "goIdle" memory (memory allocated from the OS that's not actually in use) ; I've seen it be up to several gigs. With this patch, we now take a profile every time a new high-water mark is reached. The recorded high-water mark is reset once an hour, this way ensuring that one big measurement doesn't stop us forever from taking another profile. It's simpler and it has a chance of catching some further heap increases. The rationale behind the patch is that we want profiles when the heap is large more than when the RSS is large. I'm looking at a case where we took a heap profile when the heap was 4.5 gigs with 2 gigs idle and then never took one again because of how the heuristic works. And then we OOMed when the heap was larger and the idle space was lower, but the RSS was about the same. With this patch, we would have taken a profile at a more interesting time. Release note: None Co-authored-by: Ben Darnell <ben@bendarnell.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Build succeeded |
Templates out rankOp into two operators (for dense and non-dense
case) and templates out support for PARTITION BY clause for both
rankOp and rowNumberOp. The code is undertested and is lacking
benchmarks, but that will be addressed soon.